# Visual language understanding

Gemma 3 4b It Qat GGUF

Gemma 3 is a lightweight, advanced open model series from Google, built on the same research and technology used to create Gemini models. This model is multimodal, capable of processing both text and image inputs to generate text outputs.

Image-to-Text English

VL Rethinker 7B Mlx 4bit

A 4-bit MLX-quantized variant of the TIGER-Lab/VL-Rethinker-7B model, optimized for Apple devices and supporting visual question-answering tasks.

Image-to-Text English

Qwen Qwen2.5 VL 32B Instruct GGUF

Qwen2.5-VL-32B-Instruct is a 32B-parameter multimodal vision-language model that supports image understanding and text generation tasks.

Image-to-Text English

Qwen2 Vl 7b Rslora Offensive Meme Singapore

A visual language model for classifying offensive memes in the Singapore context, fine-tuned from Qwen2-VL-7B-Instruct.

Multimodal Fusion

Transformers English

Mulberry Qwen2vl 7b

The Mulberry model is a step-by-step reasoning model trained on the Mulberry-260K SFT dataset, which was generated through collective knowledge search.

The Magician is the first multimodal large language model with free-form multi-image localization capabilities. It achieves precise localization in complex multi-image scenarios and outperforms 70B-scale models.

Transformers English

Open LLaVA NeXT LLaMA3 8B

An open-source chatbot model trained by fine-tuning the entire model on open-source data, which can be used for research on multimodal models and chatbots.

Share4oReasoning

Qwen2 VL 7B Instruct GGUF

Qwen2-VL-7B-Instruct is a multimodal vision-language model that supports the joint understanding and generation of images and text.

Transformers English

GLM-Edge-V-5B is a 5-billion-parameter multimodal model that accepts image and text inputs and performs image understanding and text generation tasks.

© 2025 AIbase